Efficient Active Learning


Efficient Active Learning with Abstention

Neural Information Processing Systems

The goal of active learning is to achieve the accuracy of passive learning while using far fewer labels. Exponential savings in label complexity have been proven in very special cases, but fundamental lower bounds show that such improvements are impossible in general. This suggests a need to explore alternative goals for active learning. Learning with abstention is one such alternative. In this setting, the active learning algorithm may abstain from prediction and incur an error that is marginally smaller than that of random guessing.
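The abstention mechanism is easy to picture with Chow-style thresholding. The sketch below is not the paper's algorithm, only a minimal illustration with made-up threshold and abstention-cost values: the predictor outputs a label only when the posterior is confident, and abstaining costs marginally less than the 1/2 risk of random guessing.

```python
def chow_predict(p_pos, threshold=0.8):
    """Chow's rule: predict a label when the posterior is confident,
    abstain (return None) when it is too close to 1/2."""
    if p_pos >= threshold:
        return 1
    if p_pos <= 1 - threshold:
        return 0
    return None  # abstain

def risk(p_pos, decision, abstain_cost=0.49):
    """Expected cost of a decision when the true positive-probability
    is p_pos.  Abstaining costs marginally less than the 1/2 risk of
    random guessing, which is what makes abstention worthwhile near
    the decision boundary."""
    if decision is None:
        return abstain_cost
    return p_pos if decision == 0 else 1 - p_pos
```

Near the boundary (`p_pos` close to 0.5), `risk(p_pos, None)` is strictly below the 1/2 achieved by any committed guess, so the abstain option dominates exactly where labels are least informative.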


Efficient Active Learning for Gaussian Process Classification by Error Reduction

Neural Information Processing Systems

Active learning sequentially selects the best instance for labeling by optimizing an acquisition function, thereby enhancing data/label efficiency. The selection can be made either from a discrete instance set (pool-based scenario) or from a continuous instance space (query synthesis scenario). In this work, we study both active learning scenarios for Gaussian Process Classification (GPC). Existing active learning strategies that maximize the Estimated Error Reduction (EER) aim to reduce the classification error after training with the newly acquired instance, in a one-step-look-ahead manner. Computing EER-based acquisition functions is typically prohibitive, as it requires retraining the GPC with every new query.
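The one-step-look-ahead computation, and why it is expensive, can be shown on a toy model. The sketch below is not the paper's GPC algorithm: it uses a noiseless version-space Bayes model over a finite hypothesis set, assumed purely for illustration. Note that scoring a single candidate already requires refitting the posterior once per possible label, which is the cost the abstract refers to.

```python
def posterior_predict(hyps, labeled, x):
    """Bayes predictive p(y=1|x): average over hypotheses consistent
    with the labeled data (uniform prior, noiseless toy model)."""
    consistent = [h for h in hyps if all(h(xi) == yi for xi, yi in labeled)]
    return sum(h(x) for h in consistent) / len(consistent)

def expected_error(hyps, labeled, pool):
    """Expected 0/1 error of the Bayes-optimal prediction on the pool."""
    total = 0.0
    for x in pool:
        p = posterior_predict(hyps, labeled, x)
        total += min(p, 1 - p)  # error of predicting the majority label
    return total / len(pool)

def eer_query(hyps, labeled, pool):
    """One-step-look-ahead EER: pick the pool point whose labeling is
    expected to reduce the posterior classification error the most.
    Each candidate requires one posterior refit per possible label --
    the source of the prohibitive cost for GPC."""
    best_x, best_err = None, float("inf")
    for x in pool:
        p1 = posterior_predict(hyps, labeled, x)
        err = 0.0
        for y, py in ((0, 1 - p1), (1, p1)):
            if py == 0:
                continue  # label impossible under the posterior
            err += py * expected_error(hyps, labeled + [(x, y)], pool)
        if err < best_err:
            best_x, best_err = x, err
    return best_x
```

On 1-D threshold classifiers, `eer_query` recovers binary-search-like behavior: it queries a point in the middle of the current version space rather than one whose label is already determined.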


Efficient active learning of sparse halfspaces with arbitrary bounded noise

Neural Information Processing Systems

We study active learning of homogeneous $s$-sparse halfspaces in $\mathbb{R}^d$ under the setting where the unlabeled data distribution is isotropic log-concave and each label is flipped with probability at most $\eta$ for a parameter $\eta \in \big[0, \frac12\big)$, known as bounded noise. Even in the presence of mild label noise, i.e. when $\eta$ is a small constant, this is a challenging problem, and only recently have label complexity bounds of the form $\tilde{O}(s \cdot \mathrm{polylog}(d, \frac{1}{\epsilon}))$ been established in [Zhang 2018] for computationally efficient algorithms. In contrast, under high levels of label noise, the label complexity bounds achieved by computationally efficient algorithms are much worse: the best known result [Awasthi et al. 2016] provides a computationally efficient algorithm with label complexity $\tilde{O}\big((\frac{s \ln d}{\epsilon})^{\mathrm{poly}(1/(1-2\eta))}\big)$, which is label-efficient only when the noise rate $\eta$ is a fixed constant. In this work, we substantially improve on it by designing a polynomial-time algorithm for active learning of $s$-sparse halfspaces, with a label complexity of $\tilde{O}\big(\frac{s}{(1-2\eta)^4} \mathrm{polylog}(d, \frac{1}{\epsilon})\big)$. This is the first efficient algorithm with label complexity polynomial in $\frac{1}{1-2\eta}$ in this setting, which is label-efficient even for $\eta$ arbitrarily close to $\frac12$.
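Two ingredients common to this line of work, margin-based querying and hard thresholding for attribute efficiency, can be sketched as follows. This is an illustrative fragment, not the paper's algorithm: the helper names and the margin parameter are made up for the example.

```python
def hard_threshold(w, s):
    """Keep the s largest-magnitude coordinates of w and zero the rest --
    the standard attribute-efficiency step that keeps the iterate
    s-sparse between updates."""
    keep = set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:s])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]

def margin_query(pool, w, margin):
    """Label-efficient selection: query only the unlabeled points that
    fall inside the current margin band, where the hypothesis
    sign(<w, x>) is still uncertain."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return [x for x in pool if abs(dot(w, x)) <= margin]
```

Points far outside the band are classified confidently and never queried; shrinking the band as the iterate improves is what drives the polylogarithmic dependence on $1/\epsilon$ in bounds of this type.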



Review for NeurIPS paper: Efficient active learning of sparse halfspaces with arbitrary bounded noise

Neural Information Processing Systems

Summary and Contributions: This paper studies the problem of learning sparse halfspaces given access to a noisy point-label oracle. In particular, given an underlying true halfspace h*, the goal is to recover an \epsilon-accurate sparse representation h' of h* using a minimum number of noisy-oracle queries. The paper makes the following standard assumptions: (i) the underlying distribution over points is isotropic log-concave; (ii) the label noise model is Massart noise. Under these assumptions, the paper gives an efficient algorithm which \epsilon-learns halfspaces using O(s/(1-2\eta)^4 * polylog(d, 1/\epsilon)) samples, making it the first algorithm with sample complexity linear in s in this setting. This is also known to be almost information-theoretically optimal, with the upper and lower bounds differing only by a factor of O(1/(1-2\eta)^2).


Review for NeurIPS paper: Efficient active learning of sparse halfspaces with arbitrary bounded noise

Neural Information Processing Systems

All reviewers agree that this paper makes a solid contribution to active learning of sparse halfspaces (in the Massart noise model). The sample complexity bound amounts to a major improvement over the best known results for learning halfspaces, which warrants acceptance of the paper. For the camera-ready version, the authors are encouraged to take the reviewers' feedback into account to further improve the discussion of the proposed algorithm (in particular, please address the concern about the lack of intuition for why a mirror descent approach leads to such large improvements in this setting compared to earlier methods).
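For readers unfamiliar with the mirror descent component, the sketch below shows one exponentiated-gradient (EG+/-) step, the classic mirror descent instance whose negative-entropy mirror map yields regret scaling with ln d rather than d. That ln d dependence is the standard intuition for why such methods suit sparse, high-dimensional problems. This is a generic textbook update, not the paper's actual rule; step size and dimensions here are arbitrary.

```python
import math

def eg_pm_update(w_pos, w_neg, grad, eta):
    """One exponentiated-gradient (EG+/-) step: mirror descent under
    the negative-entropy mirror map on the doubled probability simplex.
    Representing w = w_pos - w_neg keeps the iterate in the L1 ball,
    and the multiplicative update gives ln(d)-type regret."""
    w_pos = [wp * math.exp(-eta * g) for wp, g in zip(w_pos, grad)]
    w_neg = [wn * math.exp(eta * g) for wn, g in zip(w_neg, grad)]
    z = sum(w_pos) + sum(w_neg)          # renormalize onto the simplex
    w_pos = [wp / z for wp in w_pos]
    w_neg = [wn / z for wn in w_neg]
    w = [wp - wn for wp, wn in zip(w_pos, w_neg)]
    return w_pos, w_neg, w
```

A single step with a gradient favoring one coordinate shifts mass multiplicatively toward that coordinate while the L1 norm of the combined iterate stays at most 1, which is the geometry that matches s-sparse targets.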





Towards Efficient Active Learning in NLP via Pretrained Representations

Vysogorets, Artem, Gopal, Achintya

arXiv.org Artificial Intelligence

Fine-tuning Large Language Models (LLMs) is now a common approach for text classification in a wide range of applications. When labeled documents are scarce, active learning helps reduce annotation effort but requires retraining massive models at each acquisition iteration. We drastically expedite this process by using pretrained representations of LLMs within the active learning loop and, once the desired amount of labeled data is acquired, fine-tuning that or even a different pretrained LLM on this labeled data to achieve the best performance. As verified on common text classification benchmarks with pretrained BERT and RoBERTa as the backbone, our strategy yields performance similar to fine-tuning all the way through the active learning loop, but is orders of magnitude less computationally expensive. The data acquired with our procedure generalizes across pretrained networks, allowing flexibility in choosing the final model or updating it as newer versions are released.
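The overall loop, frozen embeddings plus a cheap probe retrained each acquisition round, can be sketched as follows. This is a schematic with made-up dimensions, not the authors' code; in a real implementation the inputs to the probe would be pooled hidden states from BERT/RoBERTa, computed once for the whole pool.

```python
import math

def train_probe(X, y, epochs=200, lr=0.5):
    """Fit a tiny logistic-regression probe on frozen embeddings with
    batch gradient descent.  Retraining this probe every acquisition
    round is cheap compared with fine-tuning the full LLM."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1 / (1 + math.exp(-z))
            for i in range(d):
                gw[i] += (p - t) * x[i]
            gb += p - t
        w = [wi - lr * gi / len(X) for wi, gi in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def most_uncertain(w, b, pool):
    """Uncertainty sampling: pick the unlabeled embedding whose probe
    prediction is closest to 0.5 -- the next point to send for labeling."""
    def score(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        return abs(1 / (1 + math.exp(-z)) - 0.5)
    return min(pool, key=score)
```

Because the embeddings are fixed, each round costs only a small probe fit plus a scan over the pool; the expensive fine-tuning happens once, after acquisition finishes.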